INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE 






The stable configuration in 
acyclic preference-based systems 

Fabien Mathieu — Gheorghe Postelnicu — Julien Reynier 






N° 6628 

September 2008 
Theme NUM




ROCQUENCOURT




Theme NUM — Systemes numeriques 
Projets GANG 

Research report n° 6628, September 2008, 23 pages



Abstract: Acyclic preferences recently appeared as an elegant way to model many distributed systems. An
acyclic instance admits a unique stable configuration, which can reveal the performance of the system. In
this paper, we give the statistical properties of the stable configuration for three classes of acyclic preferences:
node-based preferences, distance-based preferences, and random acyclic systems. Using random overlay graphs,
we prove with mean-field and fluid-limit techniques that these systems have an asymptotically continuous
independent rank distribution for a proper scaling, and we compare the analytical solution to simulations.
These results provide a theoretical ground for validating the performance of bandwidth-based or
proximity-based unstructured systems.
Stable configuration of acyclic preference-based systems

Résumé: Acyclic preference-based systems recently appeared as an elegant way to model some peer-to-peer
distributed systems. An acyclic instance admits a unique, self-stabilizing stable configuration, which gives
a good indication of the behavior of the system. In this report, we give the statistical distribution of the
stable configuration for three types of acyclic preferences: global preferences (based on a total order of the
nodes), distance preferences (the closest is preferred), and random acyclic preferences. Assuming an
Erdős-Rényi compatibility graph, we show, using fluid-limit and mean-field techniques, the existence of a
continuous limit distribution. The relevance of the results is checked with simulations.

Keywords: acyclic systems, distribution, fluid limit, mean field, small worlds, PDE
1 Introduction

Matching problems with preferences have applications in a variety of real-world situations, including dating 
agencies, college admissions, roommate attributions, assignment of graduating medical students to their first 
hospital appointment, or kidney exchanges programs [SlIOl flO t [19 1 [20] . 

Recently, matching problems also appeared as an elegant way to model many distributed systems, including 
ad-hoc and peer-to-peer networks [T2 l [6t [TB I [T5 j [7] . In distributed systems, the preferences generally come from 
direct measurements. Those measurements can be node-related (CPU, upload/download bandwidths, storage,
battery, uptime), or edge-related (Round-Trip Time, physical/virtual distances, link capacity, co-uptime). In
most cases, the resulting preferences are acyclic: there cannot exist a cycle of more than two nodes such that each
node prefers its successor to its predecessor. As a consequence, there always exists a unique stable configuration,
which is self-stabilizing [6l[T]. This makes things much easier than in other matching problems, where finding, 
counting and comparing the stable configurations are some of the main issues [8t [2T | \T7 \ I20j. 

Modeling distributed systems with acyclic preferences allows us to predict the effective collaborations that
will occur, which, in turn, allows us to infer the performance of a given system. For convenience, the study of
an acyclic distributed system is often split into two main problems:

• How fast is the stabilization process? Because distributed systems are often highly dynamic, with constant 
churn and preference alteration, the speed of convergence can be used to determine how far the effective 
configurations are from the time-evolving stable configuration. 

• What are the properties of the stable configuration? If the stabilization process is fast enough, the effective 
and stable configurations will be close. Analyzing the latter can then give valuable information on the 
former. 

In a previous work, Mathieu investigated the first question [HI [15]. He proved that even if the convergence 
can be prohibitive under an adversarial scheduler, it is fast for realistic scenarios. The second question has
been answered for specific acyclic preferences: for real-world latency-based preferences, the stable configuration
shows, for b-matching (several mates per node allowed), small-world properties (low diameter and high
clustering coefficient) [6]; for node-related preferences, the stable configuration tends to pair nodes with similar
values [7]: this is the stratification effect, which helps, for instance, to understand upload/download correlations
in incentive networks like BitTorrent [1].

1.1 Contribution 

The studies proposed in [7] and [6] gave only partial, mostly empirical, answers about the link distribution in the
stable configuration, and proposed some conjectures. The goal of this paper is to complete these answers and
give theoretical proofs on the shape of the stable configuration.

We extend the seminal results that were given in [7] for node-based preferences: for b = 1 (simple matching
case), we prove the existence of a continuous limit distribution and solve the corresponding Partial Differential
Equation (PDE). Then we apply a similar method to distance-based and random acyclic preferences, and also
give the explicit solution of the corresponding PDE.

Lastly, we extend the results to b > 1 (multiple matchings). In that case, there is no simple expression that
gives the exact solution of the system of PDEs, but discrete equations are used to observe the asymptotic behavior
of the distribution. For node-based preferences, the exponential behavior validates the stratification effect (the
probability of being matched with a distant peer decreases exponentially with the distance), while the power law
obtained for the two other cases indicates that the small-world effect observed in [6] for latency is in fact common
to all distance-based preferences¹.

1.2 Roadmap 

In Section 2 we define the model and notation for preference-based systems. Section 3 gives the generic mean-field
method used in this paper to solve the simple matching case. The case of node-based preferences is solved
in Section 4; then the results are adapted to the distance-based and random acyclic preferences in Section 5.
Section 6 extends the formulas to multiple matchings and describes asymptotic properties of the distributions.
Lastly, Section 7 concludes.

¹ Latencies cannot be considered as real distances, mainly because the triangle inequality is not always verified.
However, they form an inframetric, which is not too far from a real metric [13].
2 Model and notation

A preference-based system is a set $V$ of $N$ nodes, whose possible interactions are described by an acceptance
graph $G$, a mark matrix $m$ and a quota vector $b$.

The quota vector $b$ limits the collaborations: a peer $i$ cannot have more than $b(i)$ simultaneous mates.

The acceptance graph $G = (V, E)$ is an undirected, non-reflexive graph. It describes allowed matchings:
a node $i$ and a node $j$ can be mated (we say that $i$ is acceptable for $j$, and vice versa) if, and only if (iff),
$\{i, j\} \in E$. For instance, in peer-to-peer networks, a node cannot be directly connected to all other peers of the
system, because of scalability, and peers that are not directly connected cannot be mated. In this paper, we
consider Erdős-Rényi graphs $G(N, p)$ (each possible edge exists with probability $p$ independently of the others;
hence the expected degree is $d = p(N - 1)$).

The mark matrix $m$ is used to construct the peers' preferences: given two nodes $j$ and $k$ acceptable for $i$, $i$
ranks $j$ better than $k$ iff $m(i, j) < m(i, k)$ (the sign is arbitrary). The following marks are considered in this paper:

Node-based: $m(\cdot, i)$ is constant (nodes have intrinsic values). These preferences are suited to modeling
peer-related performance, like access bandwidth, storage, CPU, uptime...

Geometric: the nodes are associated with $N$ points picked uniformly at random on an $n$-dimensional torus
($n \ge 1$). The marks are the distances between those points. These preferences allow a theoretical analysis of
proximity-based performance.

Meridian latencies: we considered random subsets of $N$ nodes taken from the 2500-node dataset of the
Meridian Project [18]. The marks are the (symmetric) latencies between those nodes. We do not perform
analysis for those marks, but use them in § 6.3.2 for validating the geometric approach.

Random acyclic: each edge receives a random uniformly distributed value. The name is justified because all
acyclic preferences can be described by marks on the edges (which is equivalent to assuming that $m$ is symmetric).
Hence uniformly distributed (symmetric) random marks are a convenient way to perform a uniform sampling
of the acyclic preferences [HI [T] .

All the considered marks are acyclic, and therefore a $(G, m, b)$ system admits a unique stable configuration
$C \subseteq E$, which is self-stabilizing [12], [16]. The neighbors of $i$ in $C$ are the stable mates of $i$, and the notation
$i \leftrightarrow j$ is used to express that $i$ and $j$ are stable mates.
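
For symmetric marks with pairwise distinct values (the geometric and random acyclic cases), the unique stable configuration can be computed by a greedy pass over the acceptable edges, because the best remaining edge is always preferred by both of its endpoints. The sketch below only illustrates this idea and is not the procedure used in this report; the function name `stable_b_matching`, the dictionary input format and the toy instance are assumptions.

```python
def stable_b_matching(marks, b):
    """Greedy stable configuration for symmetric, pairwise distinct edge marks.

    marks: dict mapping an acceptable edge (i, j) to its mark (lower is better).
    b: common quota, i.e. the maximum number of mates per node (assumed scalar).
    """
    remaining = {}
    for i, j in marks:
        remaining.setdefault(i, b)
        remaining.setdefault(j, b)
    config = set()
    # Scan edges from best to worst mark; keep an edge if both ends have room.
    for (i, j), _mark in sorted(marks.items(), key=lambda item: item[1]):
        if remaining[i] > 0 and remaining[j] > 0:
            config.add((i, j))
            remaining[i] -= 1
            remaining[j] -= 1
    return config


# Hypothetical toy instance: a path 0 - 1 - 2 - 3 with symmetric marks.
example_marks = {(0, 1): 0.5, (1, 2): 0.1, (2, 3): 0.7}
print(stable_b_matching(example_marks, b=1))
```

An edge that is rejected has at least one endpoint already saturated with better mates, so it cannot be a blocking pair; this is why a single pass suffices under the distinct-marks assumption.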

We assume for simplicity that $m$ is complete and not limited to the edges of $G$. For all considered preferences
but random acyclic, the completion is straightforward. For random acyclic preferences, we assume that dummy
random values are assigned to non-acceptable edges.

The preferences are denoted as follows: if $j$ is acceptable for $i$, $r_i(j)$ denotes the rank of $j$ in $i$'s list (1
being the best); $r_i$ is called the acceptable ranking of $i$. If $i$ has at least $k$ acceptable neighbors, $r_i^{-1}(k)$ is
the $k$-th node in $i$'s acceptable ranking. Similarly, for $j \ne i$, $R_i(j)$ denotes the rank of $j$ in the complete graph
(the acceptability condition is omitted). $R_i$ is called the complete ranking of $i$. For $K < N$, $R_i^{-1}(K)$ is the $K$-th
node in $i$'s complete ranking.

All stable mating probabilities that are discussed in this article are denoted by $D$. Subscripts and arguments
are used to specify the meaning of $D$ whenever needed. For instance:

• $D_{R_i}(K)$ is the probability that $i$ has a stable mate with complete rank $K$.

• $D_{N,d}(i, j)$ is the probability that $i \leftrightarrow j$, knowing that there are $N$ nodes and that the expected degree of the
acceptance graph is $d$.

• for $c \le b(i)$, $D_{r_i,c}(k)$ is the probability that the $c$-th stable mate of $i$ has relative rank $k$.

• ...

The complementary cumulative distribution function (CCDF) of $D$ is denoted $S$, and the scaled versions of $D$
and $S$ are denoted $\mathcal{D}$ and $\mathcal{S}$.

3 Acyclic formulas 

We first consider the case $b = 1$ (simple matching); the results will be extended to multiple matchings in
Section 6. We give a generic formula that describes the complete rank of the mate $C(i)$ of a peer $i$.
3.1 Generic formula

Let $D_{R_i}(K)$ be the probability that $R_i(C(i)) = K$ (the probability that the mate of $i$, if any, has rank $K$).
The CCDF of $D_{R_i}$ is $S_{R_i}(K) = 1 - \sum_{L=1}^{K-1} D_{R_i}(L)$, which is the probability that $i$'s mate has a rank not smaller than $K$
($R_i(C(i)) \ge K$) or that $i$ has no mate (short notation: $R_i(C(i)) \not< K$). Following the approach proposed in [7], we
first give a generic exact formula that describes $D_{R_i}$, then we propose a simplified mean-field approximation.
To compute $D_{R_i}(K)$, one can observe that $i$ is mated with its $K$-th peer $j = R_i^{-1}(K)$ iff:

• $\{i, j\}$ is an edge of the acceptance graph; this happens with probability $p$ as $G$ is supposed to be a $G(N, p)$
graph;

• $i$ is not mated with a node better than $j$ ($R_i(C(i)) \not< K$);

• $j$ is not mated with a node better than $i$ ($R_j(C(j)) \not< R_j(i)$).

This leads to the following exact formula:

\[
\begin{aligned}
D_{R_i}(K) &= p \, P\big(R_i(C(i)) \not< K\big) \times P\big(R_j(C(j)) \not< R_j(i) \mid R_i(C(i)) \not< K\big) \\
           &= p \, S_{R_i}(K) \, P\big(R_j(C(j)) \not< R_j(i) \mid R_i(C(i)) \not< K\big).
\end{aligned}
\tag{1}
\]

3.2 Mean-field approximation 

Equation (1) is difficult to handle, mainly because of possible correlations between $R_j(C(j)) \not< R_j(i)$ and
$R_i(C(i)) \not< K$. The solution is to adopt a mean-field assumption:

Assumption 1 The events "node $i$ is not with a node better than $j$" and "node $j$ is not with a node better than
$i$" are independent.

This assumption has been proposed in [7] to solve (1) in the case of node-based preferences. It is reasonable
when $N$ is large and $p$ is small. Then (1) can be approximated by

\[ D_{R_i}(K) \approx p \, S_{R_i}(K) \, S_{R_j}(R_j(i)). \tag{2} \]

In the next two sections, we solve Equation (2) for specific preferences.



4 Node-based preferences 

We assume here that the preferences come from marks on nodes. This is equivalent to assuming a total order
among the nodes. Therefore we do not need to make the mark matrix $m$ explicit, and we can use an ordered node
labeling instead. We arbitrarily choose $1, \dots, N$ as labels, 1 being the best (1 is ranked first by all nodes that
accept 1, and so on).

Because the nodes' labels express their complete ranks, we can directly consider $D(i, j)$, the probability that
node $i$ is mated with node $j$. Node $j$ has rank $j$ for $i$ if $j < i$, and $j - 1$ if $j > i$, because a node does not rank
itself. This gives the relation between $D$ and $D_{R_i}$:

\[
D(i, j) =
\begin{cases}
D_{R_i}(j) & \text{if } j < i, \\
0 & \text{if } j = i \text{ (mating is not reflexive)}, \\
D_{R_i}(j - 1) & \text{if } j > i.
\end{cases}
\tag{3}
\]

Using the CCDF $S(i, j) := 1 - \sum_{k=1}^{j-1} D(i, k)$, we get the node-based version of Equation (2):

\[
D(i, j) =
\begin{cases}
0 & \text{if } i = j, \\
p \, S(i, j) \, S(j, i) & \text{otherwise.}
\end{cases}
\tag{4}
\]

This equation was originally proposed in [7], which also shows that it gives a very good approximation of
the empirical distribution. It can be numerically solved by using a double iteration.
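
The double iteration can be sketched as follows, assuming Equation (4) reads $D(i,j) = p\,S(i,j)\,S(j,i)$ with $S(i,j) = 1 - \sum_{k<j} D(i,k)$. Pairs are visited with $i$ increasing, then $j > i$ increasing, so that both CCDF values are already known when $D(i,j)$ is computed. The function name `mean_field_node_based` and the 0-based indexing are illustrative choices, not part of the report.

```python
def mean_field_node_based(n, p):
    """Solve D(i,j) = p * S(i,j) * S(j,i) for n nodes (0-based labels, 0 is best).

    Returns the symmetric matrix D as a list of lists.
    """
    D = [[0.0] * n for _ in range(n)]
    # acc[i] accumulates D(i, k) over the pairs already processed.
    # When pair (i, j) with i < j is reached, 1 - acc[i] equals S(i, j)
    # and 1 - acc[j] equals S(j, i).
    acc = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            value = p * (1.0 - acc[i]) * (1.0 - acc[j])
            D[i][j] = D[j][i] = value
            acc[i] += value
            acc[j] += value
    return D


D = mean_field_node_based(5, 0.5)
print([round(x, 4) for x in D[0]])
```

Keeping one running sum per node avoids recomputing the CCDF for every pair, so the whole matrix is obtained in a number of steps proportional to $N^2$.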
4.1 Fluid limit

Our main contribution for node-based preferences is to prove that, under a constant degree scaling, $D$ admits
a fluid limit. This limit gives a complete description of $D$ that can be applied to all values of $N$ and $p$, while
Equation (4) needs to be solved for each set of parameters.

4.2 Constant degree scaling 

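In order to compare the distributions for arbitrary values of $N$, we need a scaled version of $D$, where a peer $i$
is represented by a scaled ranking $0 \le \alpha < 1$. In detail, we associate to each $i$ the number $\alpha(i) = \frac{i-1}{N}$ and to
each real number $\alpha$ the node $i(\alpha) = \lfloor N\alpha \rfloor + 1$. The scaled version of $D$, denoted $\mathcal{D}$, is then defined by

\[ \mathcal{D}_N(\alpha, \beta) = N \, D(\lfloor N\alpha \rfloor + 1, \lfloor N\beta \rfloor + 1). \]

$\mathcal{D}_N$ is a piecewise constant function. Its set of function values is the set of the $N D(i, j)$ values. The factor
$N$ in its definition allows us to express $D(i, j)$ as an integral of $\mathcal{D}_N$:

\[
D(i, j) = \int_{\frac{j-1}{N}}^{\frac{j}{N}} \mathcal{D}_N\!\left(\tfrac{i-1}{N}, x\right) dx
        = \int_{\frac{i-1}{N}}^{\frac{i}{N}} \mathcal{D}_N\!\left(x, \tfrac{j-1}{N}\right) dx.
\]

The scaling of the CCDF is defined by

\[ \mathcal{S}_N(\alpha, \beta) = 1 - \int_0^{\beta} \mathcal{D}_N(\alpha, x) \, dx, \tag{5} \]

and the relation between $S$ and $\mathcal{S}$ is

\[ S(i, j) = \mathcal{S}_N\!\left(\tfrac{i-1}{N}, \tfrac{j-1}{N}\right). \tag{6} \]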

4.3 Convergence theorem 

We now want to show the existence of a continuous limit for $\mathcal{D}$. The problem is the existence of a discontinuity
for $\alpha \approx \beta$, because $D(i, i) = 0$. However, this discontinuity is just a reminder of the fact that a node cannot
mate with itself, so we propose to make $\mathcal{D}$ more "continuous" by introducing

\[
\tilde{\mathcal{D}}(\alpha, \beta) =
\begin{cases}
\mathcal{D}(\alpha, \beta) & \text{if } \lfloor N\alpha \rfloor \ne \lfloor N\beta \rfloor, \\
N p \left( S(\lfloor N\alpha \rfloor + 1, \lfloor N\alpha \rfloor + 1) \right)^2 & \text{otherwise.}
\end{cases}
\tag{7}
\]

The fluid limit of $\mathcal{D}$ is then given by the following theorem:

Theorem 1 Let $d > 0$ be a constant. If $N \to \infty$ with $p$ scaled so that the expected degree stays equal to $d$, the
functions $\tilde{\mathcal{D}}_{N,d}$ uniformly converge towards

\[
\mathcal{D}_{\infty}(\alpha, \beta) = \frac{d \, e^{d|\beta - \alpha|}}{\left( e^{d|\beta - \alpha|} + 1 - e^{-d \min(\alpha, \beta)} \right)^2}.
\tag{8}
\]

This result indicates that asymptotically, the average degree in the acceptance graph completely defines the
mating distribution. The consequence is that we can explicitly describe the so-called stratification effect [7]:
the mating distribution is exponentially decreasing with $|\beta - \alpha|$, with intensity $d$. In other words, a peer with
scaled rank $\alpha$ tends to mate with a mate of the same scaled rank, with a standard deviation of the same order as
$\frac{1}{d}$.

The proof of Theorem 1 is given in Appendix A. Note that the existence of a fluid limit was proposed
as a conjecture in [7], and proved for $\alpha = 0$ (but the expression of the fluid limit in the general case was not
provided).

Theorem 1 gives two corollaries:

• using the CCDF of $\mathcal{D}_{\infty}$, the probability that a node of scaled rank $\alpha$ has no mate is $\frac{1}{1 + e^{-d\alpha}(e^{d} - 1)}$;

• for $i \ne j$ (discrete case), a good approximation for $D(i, j)$ is

\[ \frac{p \, e^{p|j - i|}}{\left( 1 - e^{-p \min(i, j)} + e^{p|j - i|} \right)^2}. \]
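
The two corollaries can be cross-checked numerically. The sketch below assumes that the fluid limit has the closed form $\mathcal{D}_{\infty}(\alpha,\beta) = d\,e^{d|\beta-\alpha|} / (e^{d|\beta-\alpha|} + 1 - e^{-d\min(\alpha,\beta)})^2$ (the continuous counterpart of the discrete approximation above) and that the no-mate probability is $1/(1 + e^{-d\alpha}(e^{d}-1))$; it integrates the density with a midpoint rule and compares the result with the closed form. All function names are illustrative.

```python
import math


def fluid_density(alpha, beta, d):
    """Assumed fluid-limit mating density between scaled ranks alpha and beta."""
    gap = math.exp(d * abs(beta - alpha))
    return d * gap / (gap + 1.0 - math.exp(-d * min(alpha, beta))) ** 2


def no_mate_numeric(alpha, d, steps=20000):
    """1 minus the integral of the density over beta in [0, 1] (midpoint rule)."""
    h = 1.0 / steps
    mass = sum(fluid_density(alpha, (k + 0.5) * h, d) for k in range(steps)) * h
    return 1.0 - mass


def no_mate_closed_form(alpha, d):
    """Assumed closed form of the probability that scaled rank alpha is unmated."""
    return 1.0 / (1.0 + math.exp(-d * alpha) * (math.exp(d) - 1.0))


for alpha in (0.1, 0.5, 0.9):
    print(alpha, no_mate_numeric(alpha, 5.0), no_mate_closed_form(alpha, 5.0))
```

For $\alpha = 0$ the closed form reduces to $e^{-d}$, the probability that the best node has no acceptable neighbor, which is a quick sanity check of the expression.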
4.4 Validation

We compared our fluid limit approximation, given by (8), to the mean-field values given by (4), which are known
to be accurate ([T]).

$N$ was set to 50 or 2000, and $d$ to 5 or 30. Because $D$ is 2-dimensional, we arbitrarily set the scaled rank $\alpha$
to 0.1 or 0.9 (but the convergence validation holds for any $\alpha$). The results are shown in Figure 1.

We observe a gap for $j \approx \lfloor N\alpha \rfloor + 1$, because the mean-field formula sets $D$ to 0 whereas the fluid limit uses
a continuous extension.

Besides this gap, $N = 50$ (Figures 1a and 1b) shows some difference between the mean field and the fluid
limit. The error is especially noticeable for $d = 30$ (1b). However, for $N = 2000$ (Figures 1c and 1d), there is
practically no error.

These results are consistent with l(29|) (in the AppendixE]), which shows that the convergence is ©(^e***). 
Figure 1: Validation of the fluid limit for node-based preferences.



4.5 Exact resolution 

For the record, if $b = 1$, there exists an exact recursive formula for the node-based stable configuration. This
formula is

\[
D(i, j) = (1 - S(1, i)) \, D(i - 2, j - 2) + (S(1, i + 1) - S(1, j)) \, D(i - 1, j - 2) \quad \text{for } i < j,
\tag{9}
\]

with the border conditions $D(1, k) = p(1 - p)^{k-2}$ ($k \ge 2$), $D(i, i) = 0$.
This equation also admits a fluid limit, which happens to be the same as the fluid limit of the mean-field
formula. This result appears as a strong validation of the mean-field approach: although the mean-field formula
is not exact (its results differ from the exact formula), its fluid limit is exact.

One could wonder why we use a mean-field formula if a usable exact formula exists. The issue with the exact
formula is that it relies on a "trick": if you remove node 1 and its mate from the system, the remaining nodes
still form a preference-based system with the same parameters, except that there are two fewer nodes. However,
this trick cannot be generalized to other preferences or to $b > 1$. This is why we focus on the mean-field formulas.
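
The node-removal argument also suggests a direct way to compute the stable configuration of a single node-based instance when $b = 1$: the best unmated node takes its best unmated acceptable neighbor, both are removed, and the process repeats. The sketch below is an illustration under that reading, not the simulation code used for the figures; `erdos_renyi` and `stable_matching_node_based` are hypothetical helper names, and nodes are labeled from 0 (best) to $N - 1$.

```python
import random


def erdos_renyi(n, p, rng):
    """Sample a G(n, p) acceptance graph as a list of neighbor sets."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def stable_matching_node_based(adj):
    """Stable configuration for node-based preferences and b = 1.

    Labels encode the global order (0 is the best node). Returns mate[i],
    which is None when node i is unmated.
    """
    mate = [None] * len(adj)
    for i in range(len(adj)):  # best nodes choose first
        if mate[i] is not None:
            continue
        # Better neighbors (labels < i) are already settled, so only
        # worse neighbors can still be free; take the best free one.
        for j in sorted(adj[i]):
            if j > i and mate[j] is None:
                mate[i], mate[j] = j, i
                break
    return mate


rng = random.Random(0)
print(stable_matching_node_based(erdos_renyi(10, 0.3, rng)))
```

Averaging the indicator of "$i$ mated with $j$" over many sampled graphs gives an empirical estimate of $D(i, j)$ that can be compared with the mean-field values.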

A complete proof of the exact recursive formula, including its PDE counterpart and resolution, can be found
in Appendix B.

5 Acyclic and distance-based preferences 

We now consider geometric and random acyclic preferences. Following the approach used for node-based
preferences, we first focus on the complete rank distribution. Using mean-field assumptions, we propose a recursive
formula for $D$, then we solve the fluid limit. The results are then extended to the distance and acceptable
ranking distributions.

5.1 Complete rank distribution 

Assumption 1 is not enough for solving (2) in the case of geometric or random acyclic preferences. Therefore,
we propose this additional assumption:

Assumption 2 For geometric and random acyclic preferences, the following approximations hold:

• $D_{R_i}(K)$ is independent of $i$ (and therefore denoted $D_R(K)$);

• the complete ranking is symmetric: $R_i(j) = R_j(i)$.

The first approximation just states that on average, all nodes have the same mate distribution, while the
second one tells that $R_i(j)$ is a good approximation of $R_j(i)$. These approximations were motivated by the
uniform distributions used for shaping the preferences. In particular, they do not apply to node-based preferences,
where the mate distribution is strongly affected by a node's mark. Under these assumptions, we get

\[ D_R(K) = p \, S_R^2(K), \quad \text{with } S_R(K) = 1 - \sum_{L=1}^{K-1} D_R(L). \tag{10} \]

This equation gives an immediate recursion for $S_R$:

\[
S_R(K) =
\begin{cases}
1 & \text{if } K = 1, \\
S_R(K - 1) - p \, S_R^2(K - 1) & \text{otherwise.}
\end{cases}
\tag{11}
\]

In return, $D_R$ is directly given by $D_R(K) = S_R(K) - S_R(K + 1)$.



5.1.1 Fluid limit 

We now give the fluid limit of $D_R$. The scaled version of $D_R$ is defined like for node-based preferences, except
that there is now only one parameter. For $0 \le \alpha < 1$, we define $\mathcal{D}_R(\alpha) := (N - 1) D_R(\lfloor (N - 1)\alpha \rfloor + 1)$. The
scaling factor is now $N - 1$ because it is the upper bound for $K$ (while $N$ was the upper bound for $i, j$ in Section 4).

$D_R$ can be expressed as an integral of $\mathcal{D}_R$: $D_R(K) = \int_{\frac{K-1}{N-1}}^{\frac{K}{N-1}} \mathcal{D}_R(x) \, dx$. The scaled CCDF, $\mathcal{S}_R$, is then naturally
defined as:

\[ \mathcal{S}_R(\alpha) = 1 - \int_0^{\alpha} \mathcal{D}_R(x) \, dx. \]

Theorem 2 We assume that $d = p(N - 1)$ is a positive constant. As $N \to \infty$, $\mathcal{S}_R$ uniformly converges towards

\[ \mathcal{S}_{\infty}(\alpha) = \frac{1}{1 + d\alpha}. \tag{12} \]

In particular, the probability that a node has no mate in the stable configuration is $\mathcal{S}_{\infty}(1) = \frac{1}{1 + d}$, and a good
approximation for $S_R(K)$ is

\[ S_R(K) \approx \frac{1}{1 + pK}. \tag{13} \]

Sketch of proof: The proof is a simpler version of the proof of Theorem 1 (cf. Appendix A). First we prove
that the $\mathcal{D}_N$ functions are uniformly Cauchy (but in this case there is only one variable and there is no need
for a continuous extension). This proves the uniform convergence towards $\mathcal{S}_{\infty}$. Then we deduce from (11) a
differential equation verified by $\mathcal{S}_{\infty}$,

\[ -\mathcal{S}_{\infty}'(\alpha) = d \, \mathcal{S}_{\infty}^2(\alpha), \tag{14} \]

with the boundary condition $\mathcal{S}_{\infty}(0) = 1$. The resolution of (14) gives (12), which completes the proof. □
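
As a quick numerical illustration, the recursion $S_R(K) = S_R(K-1) - p\,S_R^2(K-1)$ with $S_R(1) = 1$ can be iterated and compared with the closed form $1/(1 + pK)$ obtained by solving the differential equation. The sketch below assumes these two expressions; the function names and the parameter values are illustrative.

```python
def ccdf_recursion(n_ranks, p):
    """Iterate S(K) = S(K-1) - p * S(K-1)**2 with S(1) = 1.

    Returns [S(1), ..., S(n_ranks)].
    """
    values = [1.0]
    for _ in range(n_ranks - 1):
        last = values[-1]
        values.append(last - p * last * last)
    return values


def ccdf_fluid(rank, p):
    """Fluid-limit approximation 1 / (1 + p * K) of the complete rank CCDF."""
    return 1.0 / (1.0 + p * rank)


p = 0.01
recursion = ccdf_recursion(2000, p)
worst_gap = max(abs(s - ccdf_fluid(k + 1, p)) for k, s in enumerate(recursion))
print(worst_gap)
```

The largest gap occurs for the first ranks, where the recursion starts at exactly 1 while the closed form starts at $1/(1 + p)$; it shrinks as $p$ decreases.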



5.1.2 Validation 

Contrary to the case of node-based preferences, the mean-field formula (10) has not been validated in a previous
work, so we could not compare the fluid limit with it, and used simulations². We considered random acyclic
instances, and geometric preferences in a 1-dimensional torus and in a 6-dimensional torus. $N$ was set to 50 or
2000. We used 3 values of p: I, jq and j^. For each set of parameters, the empirical distribution was calculated 
over 100 instances. The results are shown in Figure 2.

For $p = 1$ (Figures 2a and 2b), the mean-field assumptions hardly hold. As a consequence, the curves
depend on the type of preferences, and the fluid limit is not accurate. This is especially visible if $K$ is close to
the boundaries (that is 1 or $N$). In particular, the non-mate probability is clearly over-estimated. However, the
fluid limit manages to give the $O(\frac{1}{K})$ behavior that is common to all considered preferences. From that point
of view, the fluid limit performs better than the recursive equation (11), which gives $S_R(K) = \delta_{K,1}$ for $p = 1$.

For the second value of $p$ (Figures 2c and 2d), the curves are nearly indistinguishable. We verify that all types of
preferences (acyclic or geometric) tend to have the same behavior and that Theorem 2 gives precise approximations.

For the third value of $p$ (Figures 2e and 2f), the curves are indistinguishable.

We conclude that the fluid limit based on the mean-field formula is very effective for computing the complete
ranking distribution, even if $N$ is not very large and $p$ is not very small.



5.2 Distance distribution 

For geometric preferences, the actual distance between a node and its mate may be a more valuable performance
indicator than the ranking. We call $S_X(x)$ the probability that the distance between a node $i$ and its mate $C(i)$
is not less than $x$ (in other words, the distance is greater than $x$ or $i$ is unmated). Under the fluid limit, we get
a good estimate of $S_X$:

\[ S_X(x) \approx \frac{1}{d \, B_n(x) + 1}, \tag{15} \]

where $B_n(x)$ is the size of a ball of radius $x$ in the $n$-torus.

Proof: In the fluid limit, a ball of radius $x$ contains $N B_n(x)$ nodes, because it occupies a ratio $B_n(x)$ of the
torus. Therefore the farthest node in an $x$-ball centered at a node $i$ should have a complete rank $N B_n(x)$ for $i$,
while being at a distance $x$ from $i$. We deduce that $S_X(x) = S_R(N B_n(x))$. Equation (13) concludes. □

The value of $B_n(x)$ depends on $n$ and on the norm used. If we consider the maximum norm, then
$B_n(x) = \min((2x)^n, 1)$. For other norms, the formula may be more complicated because the ball may partially overlap
itself in the torus. Note that if we choose $\mathbb{R}^n$ (with uniform point distribution) instead of the $n$-torus, $B_n(x)$
is just the size of a ball of radius $x$.

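Figure 3 shows $S_X$ for $n = 1$ and $n = 3$, with the taxicab norm. With this norm, we have $B_1(x) = \min(2x, 1)$
and

\[
B_3(x) =
\begin{cases}
\frac{4}{3} x^3 & \text{if } 0 \le x \le \frac{1}{2}, \\
\frac{4}{3} x^3 - 4 \left( x - \frac{1}{2} \right)^3 & \text{if } \frac{1}{2} \le x \le 1, \\
1 - \frac{4}{3} \left( \frac{3}{2} - x \right)^3 & \text{if } 1 \le x \le \frac{3}{2}, \\
1 & \text{if } x \ge \frac{3}{2}.
\end{cases}
\]

For the parameters of Figure 3 ($N = 2000$), the fluid limit and the empirical distribution of $S_X$ were indistinguishable.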
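
The distance estimate is easy to evaluate once the ball size is known. The sketch below assumes the estimate $S_X(x) \approx 1/(d\,B_n(x) + 1)$ on a unit torus, the maximum-norm ball $\min((2x)^n, 1)$, and the piecewise taxicab expression of $B_3$ (octahedron volume minus the six caps cut by the torus faces); the function names are illustrative.

```python
def max_norm_ball(x, n):
    """Size of a max-norm ball of radius x in the unit n-torus."""
    return min((2.0 * x) ** n, 1.0)


def taxicab_ball_3(x):
    """Size of a taxicab ball of radius x in the unit 3-torus (assumed formula)."""
    if x <= 0.0:
        return 0.0
    if x <= 0.5:
        return 4.0 / 3.0 * x ** 3
    if x <= 1.0:
        return 4.0 / 3.0 * x ** 3 - 4.0 * (x - 0.5) ** 3
    if x <= 1.5:
        return 1.0 - 4.0 / 3.0 * (1.5 - x) ** 3
    return 1.0


def distance_ccdf(ball_size, d):
    """Fluid-limit estimate of P(distance to mate >= x or no mate)."""
    return 1.0 / (d * ball_size + 1.0)


d = 20.0  # illustrative expected degree, not a value taken from the report
for x in (0.1, 0.5, 1.0, 1.5):
    print(x, distance_ccdf(taxicab_ball_3(x), d))
```

The three branches of the taxicab formula agree at $x = \frac{1}{2}$, $x = 1$ and $x = \frac{3}{2}$, which gives a simple consistency check of the piecewise expression.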



² Actually, we did validate the mean-field formula, but our results are yet to be published.
Figure 2: Empirical validation of the fluid limit.



5.3 Acceptable rank distribution 

Now we want to investigate the probability that the mate of a node has an acceptable rank $k$. We call $D_r(k)$
this probability. Like for the other distributions, we introduce the CCDF $S_r(k) := 1 - \sum_{l=1}^{k-1} D_r(l)$.

Following the complete ranking method, we consider the conditions for a node $i$ to be mated with its $k$-th
best neighbor $j = r_i^{-1}(k)$:

• i must have k neighbors or more, 

• it must not be mated with someone better than j, 
• $j$ must not be mated with someone better than $i$.

Figure 3: Distance distribution (N = 2000, p ~ thq); x-axis: distance $x$.

With the acceptable ranking, there are intrinsic correlations between these events that complicate things.
Despite that, assuming that these events are independent allows us to give a first, inaccurate, recursive
formula:

\[ D_r(k) = S_r(k) \, \frac{1 - I_{1-p}(n - k + 1, k)}{k + 1}, \tag{16} \]

where $I_x$ is the regularized incomplete beta function.

Proof: $i$ has $k$ neighbors or more with probability $1 - I_{1-p}(n - k + 1, k)$. The probability that $i$ is not with
a node better than $j$ is $S_r(k)$. For the reciprocal, we can use $K = \frac{k}{p}$ as a (very rough) approximation of the complete
rank; then Equation (13) gives the probability $\frac{1}{k + 1}$. Formula (16) follows. □

The results are shown in Figure 4. One can observe that Equation (16) is not accurate for $D_r(1)$, which
causes a gap between the empirical distribution and the formula.



Figure 4: Acceptable ranking CCDF (N = 2000, p -^). Curves: empirical distribution, fluid limit, adjusted fluid limit; x-axis: acceptable rank $k$.



In an attempt to adjust the formula, we propose a more accurate estimation of $D_r(1)$: under the normalized
fluid limit, the scaled rank of the first neighbor $j$ of a given peer $i$ follows the distribution $d e^{-d\alpha}$. $j$ and $i$ are
mates if $j$ is mated with someone with a scaled rank greater than or equal to $\alpha$, which happens with probability $\frac{1}{1 + d\alpha}$.
Thus we have

\[ D_r(1) = \int_0^{\infty} \frac{d \, e^{-d\alpha}}{1 + d\alpha} \, d\alpha = e \, E_1(1) \approx 0.596. \tag{17} \]



Figure 5: $D_r(1)$ as a function of $p$ ($N = 2000$), for the 1-torus and the 3-torus.



The accuracy of $D_r(1) = e \, E_1(1)$ ($E_1$ denotes the exponential integral) is verified in Figure 5. If we use this
value for adjusting the fluid limit, we get a better estimation of $S_r$ for small values of $k$ (cf. Figure 4). However,
this adjustment introduces a gap for larger values of $k$. In a further version of this paper, we will aim at unifying
these two estimates, which will require a better understanding of the correlations that occur when considering
the acceptable rank.
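
The constant $e\,E_1(1) \approx 0.596$ can be checked without any special-function library. The sketch below assumes that, after the change of variable $t = d\alpha$, the estimate of $D_r(1)$ is the integral of $e^{-t}/(1 + t)$ over $[0, \infty)$; it truncates the integral at a large bound and applies a midpoint rule. The function name, the truncation bound and the step count are illustrative choices.

```python
import math


def first_rank_probability(upper=50.0, steps=200000):
    """Midpoint-rule estimate of the integral of exp(-t) / (1 + t) on [0, upper].

    The tail beyond `upper` is below exp(-upper) and is neglected.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += math.exp(-t) / (1.0 + t)
    return total * h


print(first_rank_probability())
```

The value does not depend on $d$, which is consistent with the nearly flat curves one would expect when plotting $D_r(1)$ against $p$.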

6 b-matching generalization

We now extend our results to the case of multiple matchings. For simplicity, we consider here that the quota
vector $b$ is a scalar, i.e. that all nodes share the same number of authorized collaborations. For distance and
acyclic preferences, we focus on the complete rank, although distance and acceptable ranking could be derived
using the same techniques as for $b = 1$.

6.1 Mean Field formulas 

A peer can now have up to $b$ mates. For $1 \le c \le b$, $D_c$ denotes the distribution of the complete ranking of the
$c$-th best mate, and $S_c$ denotes the corresponding CCDF. Like we did for $b = 1$, we can give the conditions for a
node $j = R_i^{-1}(K)$ to be the $c$-th mate of a node $i$:

• $\{i, j\}$ is an acceptable edge,

• the $(c - 1)$-th mate of $i$ (if $c > 1$) is better than $j$, but the $c$-th (if any) is not,

• the $b$-th mate of $j$ (if any) is not better than $i$.

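By extending Assumption 1, we obtain a generic mean-field formula for multiple matchings:

\[
D_{R_i,c}(K) \approx
\begin{cases}
p \, S_{R_i,1}(K) \, S_{R_j,b}(R_j(i)) & \text{if } c = 1, \\
p \, \left( S_{R_i,c}(K) - S_{R_i,c-1}(K) \right) S_{R_j,b}(R_j(i)) & \text{if } c > 1.
\end{cases}
\tag{18}
\]

Like for the simple matching case, this formula can be adapted to specific preferences.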
We first consider node-based preferences. $D_c(i, j)$ being the probability that the $c$-th mate of $i$ is $j$, we have
the following system, which can be solved by a double iteration on $i$ and $j$ (cf. [7]):

\[
D_c(i, j) =
\begin{cases}
0 & \text{if } i = j, \\
p \, S_1(i, j) \, S_b(j, i) & \text{if } i \ne j,\ c = 1, \\
p \, \left( S_c(i, j) - S_{c-1}(i, j) \right) S_b(j, i) & \text{if } i \ne j,\ c > 1.
\end{cases}
\tag{19}
\]

Then, for acyclic and distance-based preferences, we also extend Assumption 2 (homogeneity of the
distributions and symmetry of the complete ranking). This gives the following system:

\[
D_{R,c}(K) =
\begin{cases}
p \, S_{R,1}(K) \, S_{R,b}(K) & \text{if } c = 1, \\
p \, \left( S_{R,c}(K) - S_{R,c-1}(K) \right) S_{R,b}(K) & \text{if } c > 1.
\end{cases}
\tag{20}
\]

Using $S_{R,c}(1) = 1$ and $D_{R,c}(K) = S_{R,c}(K) - S_{R,c}(K + 1)$, Equation (20) immediately gives an iterative
computation of $S_{R,c}$.
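
The iterative computation can be sketched as follows, assuming the system reads $D_{R,c}(K) = p\,(S_{R,c}(K) - S_{R,c-1}(K))\,S_{R,b}(K)$ with the convention $S_{R,0} = 0$, together with $S_{R,c}(1) = 1$ and $S_{R,c}(K+1) = S_{R,c}(K) - D_{R,c}(K)$. The function name `mean_field_b_matching` and the list layout are illustrative.

```python
def mean_field_b_matching(n_ranks, p, b):
    """Iterate the homogeneous mean-field system for b-matching.

    Returns S with S[c - 1][K - 1] = S_{R,c}(K) for 1 <= c <= b, 1 <= K <= n_ranks.
    """
    S = [[1.0] for _ in range(b)]  # S_{R,c}(1) = 1 for every c
    for _ in range(n_ranks - 1):
        current = [S[c][-1] for c in range(b)]
        saturation = current[b - 1]  # the peer still has a free slot
        for c in range(b):
            previous = current[c - 1] if c > 0 else 0.0  # S_{R,0} = 0
            d_c = p * (current[c] - previous) * saturation
            S[c].append(current[c] - d_c)
    return S


S = mean_field_b_matching(2000, 0.01, 3)
print([round(S[c][-1], 4) for c in range(3)])
```

For $b = 1$ the loop reduces to the simple matching recursion $S_R(K+1) = S_R(K) - p\,S_R^2(K)$, which is a convenient regression check.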

Figure 6 shows $S_c(i, j)$ (node-based) and $S_{R,c}$ (acyclic/geometric) as obtained by (19) and (20). The parameters
are $b \in \{2, 3, 4\}$, $N = 2000$, $p$ = j^, and $i = 1001$ (for $S_c(i, j)$). We verified for each set of parameters
that the curves coincide with the empirical distribution. $S$ and $S_R$ (CCDF for $b = 1$) are also plotted to serve
as landmarks. We see that the curves have a behavior similar to the simple matching case: for
node-based preferences, it seems that the distributions $D_c(i, \cdot)$ are still exponentially decreasing, even if it seems
that there are now offsets between the distribution peaks and $i$. For acyclic and geometric preferences, we still
observe a kind of power-law behavior.



6.2 Fluid limits 

Fluid limits also exist for $b > 1$. We will not present the proofs in this paper, because they are essentially the
same as the uniformly Cauchy proofs for the simple matching fluid limits, only more complex to write because
of the multiple distributions involved. Therefore we just give the equations verified by the limits.
For node-based preferences, the scaled limit $\mathcal{S}_c$ of the CCDF verifies:

\[
\frac{\partial \mathcal{S}_c}{\partial \beta}(\alpha, \beta) =
\begin{cases}
-d \, \mathcal{S}_1(\alpha, \beta) \, \mathcal{S}_b(\beta, \alpha) & \text{for } c = 1, \\
-d \, \left( \mathcal{S}_c(\alpha, \beta) - \mathcal{S}_{c-1}(\alpha, \beta) \right) \mathcal{S}_b(\beta, \alpha) & \text{otherwise,}
\end{cases}
\tag{21}
\]

with border conditions $\mathcal{S}_c(\alpha, 0) = 1$.

Similarly, for acyclic and distance-based preferences, the scaled limit $\mathcal{S}_{R,c}$ of the CCDF verifies

\[
\mathcal{S}_{R,c}' =
\begin{cases}
-d \, \mathcal{S}_{R,1} \, \mathcal{S}_{R,b} & \text{if } c = 1, \\
-d \, \left( \mathcal{S}_{R,c} - \mathcal{S}_{R,c-1} \right) \mathcal{S}_{R,b} & \text{if } c > 1,
\end{cases}
\tag{22}
\]

with the boundary condition $\mathcal{S}_{R,c}(0) = 1$.

There is no simple explicit solution for Equations (21) and (22). However, (19) and (20) can still be used
as difference equations to approximate a numerical solution. The reason why we give these limits is
that we think that they can give us valuable information about the asymptotic behavior of the distribution
(exponentially decreasing or power law), even if this work is still to be done.



6.3 Discussion 

6.3.1 Stratification trade-off

As we have seen, for node-based preferences, the mates of a given peer $i$ have, on average, the same rank as $i$.
This is the stratification effect [7], which guarantees some fairness in the stable configuration: the expected
gain of a node tends to be the value offered by this node, measured in terms of ranking. However, we also
observed that the exponential decrease of the $\mathcal{D}_c(i, \cdot)$ functions produces a standard deviation of the same
order as $\frac{1}{d}$, where $d$ is the average degree in the acceptance graph. This gives the following stratification
trade-off:

• if $d$ is too small, the standard deviation is high. In particular, if the mark matrix is non-uniformly
distributed, there can be a big difference between the expected gain and gift, measured with the marks.
This issue has been highlighted in [7] for explaining a possible workaround of BitTorrent's Tit-for-Tat
policy;
Figure 6: Complete rank CCDFs for $b > 1$ (last panel: (c) $b = 4$; horizontal axis: absolute rank). Node-based (resp. acyclic/geometric) distributions are on the left 
(resp. right) side. $N = 2000$, p = and $i = 1001$ (for $S_c(i,j)$). 



• on the other hand, a high $d$ enforces fairness. However, a large degree in the acceptance graph 
has a cost for the nodes (memory usage, overlay management, ...). Also, the absence of long-range mates 
makes the diameter of the stable configuration high, which can be problematic if messages are to be spread 
using stable edges. 

Note that there is a similar trade-off for $b$, which is the maximal degree in the stable configuration. This 
suggests that most node-based preference systems (this includes the systems based on the sharing of an access 
bandwidth, a storage or CPU capacity, an expected uptime, ...) should admit an optimal pair $(d, b)$ with respect 
to the properties of the stable collaborations, whose values depend on the weight put on the effects presented above. 
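
To make the role of $d$ in this trade-off concrete, the following Python sketch evaluates the spread of the mate position for simple matchings ($b = 1$). It assumes the closed-form scaled density $\mathcal{D}_\infty(\alpha,\beta) = d\,e^{d(\beta-\alpha)} / (1 - e^{-d\alpha} + e^{d(\beta-\alpha)})^2$ stated in Theorem 3 (Appendix B); the function names, the midpoint integration rule and the choice $\alpha = 0.5$ are illustrative assumptions.

```python
import math


def mate_density(d, alpha, beta):
    """Scaled density of the mate position beta for a peer at position alpha (b = 1)."""
    e = math.exp(d * (beta - alpha))
    return d * e / (1.0 - math.exp(-d * alpha) + e) ** 2


def mate_offset_stats(d, alpha=0.5, grid=20000):
    """Mean and standard deviation of the mate position, conditioned on being mated."""
    h = 1.0 / grid
    betas = [(k + 0.5) * h for k in range(grid)]          # midpoint rule on [0, 1]
    weights = [mate_density(d, alpha, beta) * h for beta in betas]
    mass = sum(weights)                                   # probability of being mated
    mean = sum(w * beta for w, beta in zip(weights, betas)) / mass
    var = sum(w * (beta - mean) ** 2 for w, beta in zip(weights, betas)) / mass
    return mean, math.sqrt(var)


if __name__ == "__main__":
    for d in (5.0, 10.0, 20.0, 40.0):
        mean, std = mate_offset_stats(d)
        print(f"d = {d:5.1f}  mean = {mean:.3f}  std = {std:.4f}")
```

In this scaled setting positions live in $[0,1]$, so the deviation in rank units is $N$ times the printed value; the sketch only illustrates that the spread shrinks as $d$ grows.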

6.3.2 Small-World effect in geometric preferences 

A small world is a sparse graph with a low average shortest path length (ASPL) and a high clustering coefficient. 
In detail: 
Type of preferences    ASPL    Clustering coefficient
1-torus                7.4     0.055
2-torus                6.7     0.043
Meridian               6.1     0.031
3-torus                5.9     0.033
4-torus                5.1     0.027
Random acyclic         5       0.0043

Table 1: ASPL and clustering (N = 2000, p = , b = 10). 



• sparse graph means that the average degree is $O(1)$ or $O(\log n)$, 

• low ASPL means $O(\log n)$, 

• high clustering coefficient means that two nodes sharing an edge are likely to have a common neighbor. 
The clustering coefficient is a probability that must be compared to the clustering coefficient of a random 
graph with the same number of nodes and edges. 
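
The two quantities above can be measured on any graph with a few lines of code. The sketch below is a minimal Python illustration, not the tool used for Table 1. It assumes the graph is given as a dictionary mapping each node to the set of its neighbors, and it takes the clustering coefficient to be the global transitivity (fraction of length-2 paths that are closed), which is one common reading of the definition; the function names are hypothetical.

```python
from collections import deque


def aspl(adj):
    """Average shortest path length over ordered pairs of connected nodes (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


def clustering_coefficient(adj):
    """Probability that two neighbors of a same node are themselves neighbors."""
    closed, triples = 0, 0
    for neigh in adj.values():
        neigh = list(neigh)
        for a in range(len(neigh)):
            for b in range(a + 1, len(neigh)):
                triples += 1
                if neigh[b] in adj[neigh[a]]:
                    closed += 1
    return closed / triples if triples else 0.0


def reference_clustering(adj):
    """Edge density, i.e. the expected clustering of a random graph of same size."""
    n = len(adj)
    m = sum(len(neigh) for neigh in adj.values()) / 2
    return 2 * m / (n * (n - 1))


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1}}
    print(aspl(path), clustering_coefficient(path), reference_clustering(path))
```

A graph qualifies as a small world in the sense above when `aspl` stays logarithmic in the number of nodes while `clustering_coefficient` is well above `reference_clustering`.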

In [11], Kleinberg proved that an $n$-dimensional grid can be turned into a small world by adding long-range edges 
whose lengths follow a distribution proportional to $1/r^n$, where $r$ is the distance between the two endpoints. 

For multiple matchings, the stable configuration in geometric preferences is likely to have a high clustering 
coefficient, because most of the stable edges link close nodes. Moreover, the power-law rank distribution tells 
us that long-range edges exist. So the stable configuration is likely to qualify as a small world. 

In Table 1, we give the ASPL and clustering coefficient for some preferences, using the parameters of Table 1 
($N = 2000$, $b = 10$). The reference clustering is here approximately $0.005$. We verify that, for the $n$-tori, the stable 
configurations are small worlds. On the other hand, as previously observed in [6], the stable configurations 
of random acyclic preferences are not small worlds, because of their clustering coefficient (they behave like an 
incomplete $b$-regular graph). 

We also calculated the ASPL and clustering obtained by using the Meridian Project's real-world latencies, 
which are known to produce small-world configurations [6]. One can observe that the results are very close 
to those obtained with the tori. Interestingly, the closest results are those from the 3-torus, suggesting that, 
somehow, 3 may be seen as a sort of dimension for the latency space. Considering the recent eagerness to estimate 
the Internet dimension (see for instance [2]), this unexpected result is appealing: it suggests that the stable 
configuration, which is only defined by how the nodes rank each other (latencies are used for sorting the nodes, 
but the actual values are never involved in the construction), could reveal valuable insight into the topology 
behind a set of distances. 

7 Conclusion 

We gave a statistical description of the stable configurations obtained from node-based preferences, 
distance-based preferences, and random acyclic preferences. Starting from a generic formula for the rank 
distribution, we introduced mean-field and fluid limit techniques in order to give explicit formulas. All our results 
were validated by means of simulations. An interesting consequence of our results is that for distance-based 
preferences, the stable configurations behave similarly to Kleinberg's grids, and are small-world graphs. 
A Proof of Theorem 1 



The proof relies on the following steps: 

• we prove that the $\mathcal{D}_N$ functions are uniformly Cauchy on $[0, 1]^2$; 

• we use the Cauchy convergence to show that $S_N$ and $\mathcal{D}_N$ have limits $S_\infty$ and $\mathcal{D}_\infty$, and we give a PDE 
verified by $S_\infty$; 

• we solve the PDE, and use the solution to get $\mathcal{D}_\infty$. 

A.1 Uniform convergence 

Let $N$ be fixed, and $N_1, N_2$ be two integers greater than $N$. The corresponding Erdős-Rényi probabilities are 
$p_1 = \frac{d}{N_1}$ and $p_2 = \frac{d}{N_2}$. We consider the error function $E$ defined by Equation (23). 



To prove that $\mathcal{D}_N$ is uniformly Cauchy, we need to find a bound for $E$ that applies for any $(\alpha, \beta) \in [0,1]^2$, 
and that tends towards $0$ as $N$ goes to infinity. 



Let ai, /3i, az, 1^2 be respectively J^TT^, Using © and ©, we have, for k G {1,2}, 



It would be nice to have a and (3 instead of au and /3fc, and V instead of T). In order to do that, we notice 
the following: 

• same for {Pk , x) ; 



• the same with a and /3 switched; 

• (a, x) is bounded by d and only differs from V^^ (a, x) for [A^/taJ [Nkx\ . It follows that | Pjv^ (a, x) — 



\[ E(\alpha,\beta) = \left| \mathcal{D}_{N_1}(\alpha,\beta) - \mathcal{D}_{N_2}(\alpha,\beta) \right| \tag{23} \]



Vn, (a, /3) - d{l~ /jf'= Pjv, (afc, a;) dx) x 
x(l-/;'= VN,Wk,x)dx) 



• < 1, so we have "Dn^ < NkPk — d. PiS a — ak < < , it follows that J2 '^n^. (/?, x)dx < 



• the same with a and switched. 



We deduce that 




and the same with a and /? switched. Then, if we call 




(24) 



we have 



(25) 



This gives us 



(26) 



d Sni {a, f3)SNi iP, a) - (", f3)SN2 (/?, «) 



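Using the definition of $S_{N_k}$, we see that 

\[ \left| S_{N_1}(\alpha,\beta)S_{N_1}(\beta,\alpha) - S_{N_2}(\alpha,\beta)S_{N_2}(\beta,\alpha) \right| \le \int_0^\beta E(\alpha,x)\,dx + \int_0^\alpha E(\beta,x)\,dx + \int_0^\beta E(\alpha,x)\,dx \int_0^\alpha E(\beta,x)\,dx. \]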



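Note that both $\mathcal{D}_{N_1}(\alpha, \cdot)$ and $\mathcal{D}_{N_2}(\alpha, \cdot)$ are probabilities, so we can bound $\int_0^\beta E(\alpha,x)\,dx$ by $2$ in the integral 
product. Then (26) becomes 

\[ E(\alpha,\beta) \le \frac{16d^2}{N} + d\left( \int_0^\beta E(\alpha,x)\,dx + 3\int_0^\alpha E(\beta,x)\,dx \right). \tag{27} \]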



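We now want to merge $\alpha$ and $\beta$ into a single variable. Therefore, we define $F(\gamma) := \sup_{\alpha \le 1,\ \beta \le 1,\ \alpha+\beta \le \gamma} E(\alpha,\beta)$. 
For any $\alpha \le 1$, $\beta \le 1$, $\gamma \ge \alpha + \beta$, we have 

\[ \int_0^\beta E(\alpha,x)\,dx \le \int_0^\beta F(\alpha+x)\,dx = \int_\alpha^{\alpha+\beta} F(x)\,dx \le \int_0^\gamma F(x)\,dx, \]

and the same for $\int_0^\alpha E(\beta,x)\,dx$. It follows that 

\[ F(\gamma) \le \frac{16d^2}{N} + 4d\int_0^\gamma F(x)\,dx. \tag{28} \]

It follows that $F(\gamma) \le \frac{16d^2}{N}e^{4d\gamma}$ by Gronwall's lemma [3]. As a special case, for all $\alpha, \beta \le 1$, we have 

\[ E(\alpha,\beta) \le F(2) \le \frac{16d^2}{N}e^{8d}. \tag{29} \]

This concludes the proof that $\mathcal{D}_N$ is uniformly Cauchy. 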



A.2 PDE 

As $\mathcal{D}_N$ is uniformly Cauchy on $[0,1]^2$, it converges towards a function $\mathcal{D}_\infty$. Using (24), we deduce that $S_N$ 
converges towards a continuous function $S_\infty$, and that $-\mathcal{D}_\infty$ is the partial derivative of $S_\infty$ with respect to its 
second variable. 

Then, if we make $N$ go to infinity in (25), we obtain the PDE verified by $S_\infty$: 

\[ \partial_y S_\infty(\alpha,\beta) = -d\,S_\infty(\alpha,\beta)\,S_\infty(\beta,\alpha), \tag{30} \]

with limit condition $S_\infty(\alpha, 0) = 1$. 

Notice that (30) proves that $\mathcal{D}_\infty$ is continuous. 



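A.3 Resolution 

Note that for $\alpha = 0$, (30) immediately gives $S_\infty(0, \beta) = e^{-d\beta}$. 

To go further, we introduce the auxiliary function $f(\alpha,\beta) := \log\left(\frac{S_\infty(\alpha,\beta)}{S_\infty(\beta,\alpha)}\right)$. 
$f$ is skew-symmetric. Using (30) at the point $(\beta,\alpha)$, its first partial derivative is: 

\[ \partial_x f(\alpha,\beta) = \frac{\partial_x S_\infty(\alpha,\beta)}{S_\infty(\alpha,\beta)} - \frac{\partial_y S_\infty(\beta,\alpha)}{S_\infty(\beta,\alpha)} = \frac{\partial_x S_\infty(\alpha,\beta)}{S_\infty(\alpha,\beta)} + d\,S_\infty(\alpha,\beta). \]

By differentiating again, and using (30) twice more, we get the mixed derivative 

\[ \begin{aligned} \partial_{xy} f(\alpha,\beta) &= \frac{\partial_{xy} S_\infty(\alpha,\beta)}{S_\infty(\alpha,\beta)} - \frac{\partial_x S_\infty(\alpha,\beta)\,\partial_y S_\infty(\alpha,\beta)}{(S_\infty(\alpha,\beta))^2} + d\,\partial_y S_\infty(\alpha,\beta) \\ &= \frac{\partial_{yx} S_\infty(\alpha,\beta)}{S_\infty(\alpha,\beta)} + \frac{\partial_x S_\infty(\alpha,\beta)\,d\,S_\infty(\beta,\alpha)}{S_\infty(\alpha,\beta)} + d\,\partial_y S_\infty(\alpha,\beta) \\ &= -\frac{d\,S_\infty(\alpha,\beta)\,\partial_y S_\infty(\beta,\alpha)}{S_\infty(\alpha,\beta)} + d\,\partial_y S_\infty(\alpha,\beta) \\ &= 0, \end{aligned} \]

where the third line expands $\partial_{yx} S_\infty(\alpha,\beta) = -d\,\partial_x S_\infty(\alpha,\beta)\,S_\infty(\beta,\alpha) - d\,S_\infty(\alpha,\beta)\,\partial_y S_\infty(\beta,\alpha)$, and the last one uses $\partial_y S_\infty(\beta,\alpha) = \partial_y S_\infty(\alpha,\beta)$, both sides being equal to $-d\,S_\infty(\alpha,\beta)S_\infty(\beta,\alpha)$. 

The only global solutions to the wave equation $f_{xy} = 0$ are those of the form $f(\alpha,\beta) = a(\alpha) + b(\beta)$ (see [14], 
for instance). Given that $f$ is skew-symmetric, the solution is indeed of the form $f(\alpha,\beta) = a(\alpha) - a(\beta)$. The 
border conditions immediately give $f(\alpha,\beta) = d(\alpha - \beta)$. 

We deduce $S_\infty(\beta,\alpha) = S_\infty(\alpha,\beta)\,e^{d(\beta-\alpha)}$. 

If we treat $S_\infty$ as a function $S$ of $\beta$ with $\alpha$ as parameter, equation (30) becomes 

\[ S'(\beta) = -d\,S^2(\beta)\,e^{d(\beta-\alpha)}. \tag{31} \]

From there, one gets $S_\infty(\alpha,\beta) = \frac{1}{c + e^{d(\beta-\alpha)}}$ for some constant $c$ depending on $\alpha$. Given that $S_\infty(\alpha, 0) = 1$, the solution is: 

\[ S_\infty(\alpha,\beta) = \frac{1}{1 - e^{-d\alpha} + e^{d(\beta-\alpha)}}. \]

Using $\mathcal{D}_\infty = -\partial_y S_\infty$, one gets (8). This concludes the proof of Theorem 1. 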
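
As a quick numerical cross-check of this resolution, the Python sketch below verifies by finite differences that the closed form $S_\infty(\alpha,\beta) = 1/(1 - e^{-d\alpha} + e^{d(\beta-\alpha)})$ (the expression also stated in Theorem 3) satisfies the PDE (30) and its limit condition. The function names, the step size and the sample points are assumptions chosen for this illustration.

```python
import math


def s_inf(d, alpha, beta):
    """Closed-form candidate for the scaled CCDF S_infinity(alpha, beta)."""
    return 1.0 / (1.0 - math.exp(-d * alpha) + math.exp(d * (beta - alpha)))


def pde_residual(d, alpha, beta, h=1e-6):
    """Residual of (30): d/dbeta S(alpha, beta) + d * S(alpha, beta) * S(beta, alpha)."""
    d_beta = (s_inf(d, alpha, beta + h) - s_inf(d, alpha, beta - h)) / (2.0 * h)
    return d_beta + d * s_inf(d, alpha, beta) * s_inf(d, beta, alpha)


if __name__ == "__main__":
    for alpha, beta in [(0.1, 0.2), (0.5, 0.5), (0.9, 0.3)]:
        print(alpha, beta, pde_residual(10.0, alpha, beta))
    print(s_inf(10.0, 0.4, 0.0))  # limit condition S(alpha, 0) = 1
```

The residuals are zero up to the discretisation and rounding error of the central difference.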
B Exact resolution of the node-based stable configuration 
B.1 Recursive formula 

For $b = 1$, we can give an explicit recursive formula for $D(i,j)$. The first step is to compute $D(1,k)$, for 
$2 \le k \le n$. As $1$ is the best node, it can choose the best of its neighbors, so $D(1,k)$ is the probability that $k$ 
is the best of $1$'s neighbors. In other words, this is the probability that $k$ is acceptable for $1$, while all nodes $l$ 
with $1 < l < k$ are not. This gives us 

\[ D(1,k) = p(1-p)^{k-2}. \tag{32} \]

Now, we consider two nodes $i$ and $j$ such that $1 < i < j \le n$. $D(i,j) = P(i \leftrightarrow j)$ can be calculated with a 
proper conditioning on the mate $k$ (if any) of $1$. The key is to notice that if $1$ is mated with $k$, then both of 
them can be virtually removed from the graph. The remaining graph is still Erdős-Rényi and the probabilities 
are the same up to a slight relabeling: 

• if $k = i$ or $k = j$, then $i$ cannot be mated with $j$; 

• if $1 < k < i$, $i$ and $j$ can be virtually relabeled $i-2$ and $j-2$ (cf. Figure 7a), so we have 
$P(i \leftrightarrow j \mid 1 < k < i) = P((i-2) \leftrightarrow (j-2))$; 

• if $i < k < j$, $i$ and $j$ can be virtually relabeled $i-1$ and $j-2$ (cf. Figure 7b), so we have 
$P(i \leftrightarrow j \mid i < k < j) = P((i-1) \leftrightarrow (j-2))$; 

• if $1$ is not mated or $k > j$ (notation: $k \succ j$), $i$ and $j$ can be virtually relabeled $i-1$ and $j-1$ (cf. Figure 7c), 
so we have $P(i \leftrightarrow j \mid k \succ j) = P((i-1) \leftrightarrow (j-1))$. 




Figure 7: Using the mate of 1 to deduce $D(i,j)$. Panels: (a) $1 < k < i$; (b) $i < k < j$; (c) $k > j$. 



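Under this conditioning, we get 

\[ \begin{aligned} D(i,j) = {} & P(i \leftrightarrow j \mid 1 < k < i)\,P(1 < k < i) \\ & + P(i \leftrightarrow j \mid i < k < j)\,P(i < k < j) \\ & + P(i \leftrightarrow j \mid k \succ j)\,P(k \succ j). \end{aligned} \tag{33} \]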



This leads to the following formula for $D$: 

\[ \begin{aligned} D(i,j) = {} & (1 - S(1,i))\,D(i-2,j-2) \\ & + (S(1,i+1) - S(1,j))\,D(i-1,j-2) \\ & + S(1,j+1)\,D(i-1,j-1). \end{aligned} \tag{34} \]

From (32), we have $S(1,k) = (1-p)^{k-2}$. This gives 

\[ D(i,j) = A(i)D(i-2,j-2) + B(i,j)D(i-1,j-2) + C(j)D(i-1,j-1), \text{ with } \begin{cases} A(i) = 1 - (1-p)^{i-2}, \\ B(i,j) = (1-p)^{i-1} - (1-p)^{j-2}, \\ C(j) = (1-p)^{j-1}. \end{cases} \tag{35} \]
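
The recursion lends itself to a direct implementation. The Python sketch below evaluates $D(i,j)$ for $b = 1$ from Equation (32) and from the recursion with $A(i) = 1 - (1-p)^{i-2}$, $B(i,j) = (1-p)^{i-1} - (1-p)^{j-2}$ and $C(j) = (1-p)^{j-1}$ (that is, $S(1,j+1)$), then compares it with an exhaustive enumeration of all acceptance graphs on a handful of nodes. The names `make_D` and `brute_force_D` are hypothetical, and the greedy construction of the stable matching (best free node takes its best free acceptable neighbor) is the one described at the start of this section.

```python
from functools import lru_cache
from itertools import combinations, product


def make_D(p):
    """Return D(i, j): probability that ranks i < j are mates (b = 1, Erdos-Renyi p)."""
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def D(i, j):
        if i < 1 or j <= i:
            return 0.0
        if i == 1:
            return p * q ** (j - 2)                      # Equation (32)
        A = 1.0 - q ** (i - 2)                           # mate k of node 1 with 1 < k < i
        B = q ** (i - 1) - q ** (j - 2)                  # i < k < j
        C = q ** (j - 1)                                 # node 1 unmated or k > j
        return A * D(i - 2, j - 2) + B * D(i - 1, j - 2) + C * D(i - 1, j - 1)

    return D  # recursion depth grows with i, so keep i moderate in this sketch


def brute_force_D(n, p):
    """Exact D(i, j) on n nodes, by enumerating every acceptance graph."""
    pairs = list(combinations(range(1, n + 1), 2))
    result = {pair: 0.0 for pair in pairs}
    for present in product([False, True], repeat=len(pairs)):
        edges = {pair for pair, on in zip(pairs, present) if on}
        weight = 1.0
        for on in present:
            weight *= p if on else 1.0 - p
        free = set(range(1, n + 1))
        for i in range(1, n + 1):                        # best node first
            if i not in free:
                continue
            for j in range(i + 1, n + 1):                # best free acceptable neighbor
                if j in free and (i, j) in edges:
                    free -= {i, j}
                    result[(i, j)] += weight
                    break
    return result


if __name__ == "__main__":
    D = make_D(0.3)
    exact = brute_force_D(5, 0.3)
    for (i, j), value in sorted(exact.items()):
        print(i, j, round(D(i, j), 6), round(value, 6))
```

The enumeration is exponential in the number of node pairs, so it only serves as a check on very small instances.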
Now, in order to give a fluid limit, it can be convenient to reduce (34) to an expression of the complementary 
cumulative distribution $S$. Using the definition $S(i,j) = \sum_{k \ge j} D(i,k)$, Equation (35) becomes, after simplification, 

\[ S(i,j) = A(i)S(i-2,j-2) + B(i,j)S(i-1,j-2) + C_2(j)S(i-1,j-1), \text{ with } C_2(j) = (1-p)^{j-2}. \tag{36} \]

B.2 Uniform convergence 

As for the mean formula, we can prove that the scaling $\mathcal{D}_N(\alpha,\beta) = N D(\lfloor N\alpha \rfloor + 1, \lfloor N\beta \rfloor + 1)$ is 
uniformly Cauchy. The sketch of proof is the same: clean the boundary of the integrals and the other $O(\frac{1}{N})$ 
offsets, then use an auxiliary error variable $\gamma$ and use Gronwall's lemma to conclude. This guarantees the 
convergence of $\mathcal{D}_N$ and $S_N$. 

B.3 PDE 

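We will use the fact that if we use the scaling $i \to \lfloor N\alpha \rfloor + 1$, $j \to \lfloor N\beta \rfloor + 1$, then 

• $A(i)$ converges towards $1 - e^{-d\alpha}$; 

• $B(i,j)$ converges towards $e^{-d\alpha} - e^{-d\beta}$; 

• $C(j)$ and $C_2(j)$ both converge towards $e^{-d\beta}$. 

The first step is to translate (36) into an expression of $S_N$: with $\alpha = \frac{i}{N}$ and $\beta = \frac{j}{N}$, we obtain 

\[ S_N(\alpha,\beta) = A(i)S_N(\alpha - \tfrac{2}{N}, \beta - \tfrac{2}{N}) + B(i,j)S_N(\alpha - \tfrac{1}{N}, \beta - \tfrac{2}{N}) + C_2(j)S_N(\alpha - \tfrac{1}{N}, \beta - \tfrac{1}{N}). \tag{37} \]

We notice that $(A + B + C_2)(i,j) = 1 - p(1-p)^{i-2}$. If we remove $(A + B + C_2)(i,j)\,S_N(\alpha, \beta - \tfrac{2}{N})$ from each 
side of (37), and multiply the result by $N$: 

• the left part becomes 

\[ N\bigl(S_N(\alpha,\beta) - S_N(\alpha, \beta - \tfrac{2}{N})\bigr) + Np(1-p)^{i-2}\,S_N(\alpha, \beta - \tfrac{2}{N}), \]

which converges as $N \to \infty$ towards 

\[ 2\frac{\partial S_\infty}{\partial \beta} + d\,e^{-d\alpha}S_\infty; \]

• the right part becomes 

\[ \begin{aligned} & A(i)\,N\bigl(S_N(\alpha - \tfrac{2}{N}, \beta - \tfrac{2}{N}) - S_N(\alpha, \beta - \tfrac{2}{N})\bigr) + (B(i,j) + C_2(j))\,N\bigl(S_N(\alpha - \tfrac{1}{N}, \beta - \tfrac{2}{N}) - S_N(\alpha, \beta - \tfrac{2}{N})\bigr) \\ & + C_2(j)\,N\bigl(S_N(\alpha - \tfrac{1}{N}, \beta - \tfrac{1}{N}) - S_N(\alpha - \tfrac{1}{N}, \beta - \tfrac{2}{N})\bigr), \end{aligned} \]

which converges as $N \to \infty$ towards 

\[ -2(1 - e^{-d\alpha})\frac{\partial S_\infty}{\partial \alpha} - e^{-d\alpha}\frac{\partial S_\infty}{\partial \alpha} + e^{-d\beta}\frac{\partial S_\infty}{\partial \beta}. \]

So after scaling, the recursive equation is now: 

\[ 2\frac{\partial S_\infty}{\partial \beta} + d\,e^{-d\alpha}S_\infty = -2(1 - e^{-d\alpha})\frac{\partial S_\infty}{\partial \alpha} - e^{-d\alpha}\frac{\partial S_\infty}{\partial \alpha} + e^{-d\beta}\frac{\partial S_\infty}{\partial \beta}. \]

In other words, $S_\infty$ verifies the PDE: 

\[ (2 - e^{-d\alpha})\frac{\partial S_\infty}{\partial \alpha} + (2 - e^{-d\beta})\frac{\partial S_\infty}{\partial \beta} + d\,e^{-d\alpha}S_\infty = 0. \]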
oa Op 
Theorem 3 With the border condition $S_\infty(0, \beta) = e^{-d\beta}$, the unique solution of this PDE is $S_\infty(\alpha,\beta) = \frac{1}{1 - e^{-d\alpha} + e^{d(\beta-\alpha)}}$. 
The scaled version of $D$, denoted $\mathcal{D}_\infty$, thus verifies: 

\[ \mathcal{D}_\infty(\alpha,\beta) = -\frac{\partial S_\infty}{\partial \beta}(\alpha,\beta) = \frac{d\,e^{d(\beta-\alpha)}}{\left(1 - e^{-d\alpha} + e^{d(\beta-\alpha)}\right)^2}. \]

Proof: Let us change the variables: put $x = e^{d\alpha}$ and $y = e^{d\beta}$. We also make the PDE more symmetric by 
factoring out $x$: define $u$ by putting $S_\infty(\alpha,\beta) = x\,u(x,y)$. The PDE then becomes: 

\[ \left(2 - \frac{1}{x}\right)\left(d x^2 \frac{\partial u}{\partial x} + d x u\right) + \left(2 - \frac{1}{y}\right) x\,d y \frac{\partial u}{\partial y} + \frac{d}{x}\,x u = 0, \]

i.e.: 

\[ (2x - 1)\frac{\partial u}{\partial x} + (2y - 1)\frac{\partial u}{\partial y} + 2u = 0. \]

This equation is a first-order PDE of the form $F(Du, u, x) = 0$, where $F$ is linear. To solve this PDE, we use 
the classical method of characteristics described in [5], chapter 3. Let $X(s) = (x(s), y(s))$ ($s$ in an interval of 
$\mathbb{R}$) be a trajectory in the base space; define $p(s) = Du(X(s))$ and $z(s) = u(X(s))$. Then, solving the equation 
$F(p(s), z(s), X(s)) = 0$ leads to the equivalent system of ODEs (we forget about $p(s)$, which is not required to 
solve the PDE with boundary condition, see [5] p. 100 for further details): 

\[ \begin{cases} \dot{x}(s) = 2x(s) - 1, \\ \dot{y}(s) = 2y(s) - 1, \\ \dot{z}(s) = -2z(s), \\ z_0 = u(x_0 := 1, y_0) = \frac{1}{y_0}, \end{cases} \]

where $\dot{\ }$ stands for $\frac{d}{ds}$. 

These 3 ODEs have separable variables (the Cauchy-Lipschitz theorem applies for existence and uniqueness). 
The solution with the boundary condition at $s = 0$, $x_0 = 1$, $y_0 \in \mathbb{R}$, $z_0 = \frac{1}{y_0}$ is: 

\[ \begin{cases} 2x(s) - 1 = e^{2s}, \\ 2y(s) - 1 = (2y_0 - 1)e^{2s}, \\ z(s) = \frac{1}{y_0}e^{-2s}. \end{cases} \]

Now given $(x, y)$, we deduce $s$ such that $x(s) = x$ and $y(s) = y$, then $y_0$ and $z(s) = u(x,y)$: $2y_0 - 1 = \frac{2y-1}{2x-1}$, 
then 

\[ u(x,y) = \frac{1}{y_0}\cdot\frac{1}{2x - 1} = \frac{1}{x + y - 1}. \]

Replacing $x$ and $y$ by $e^{d\alpha}$ and $e^{d\beta}$ concludes the proof. □ 
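
The characteristic system used in this proof can also be integrated numerically, which gives an independent check of the formula $u(x,y) = 1/(x+y-1)$. The Python sketch below follows one characteristic from $(x_0, y_0, z_0) = (1, y_0, 1/y_0)$ with a classical fourth-order Runge-Kutta scheme; the function names, the integrator and the step count are assumptions made for this illustration.

```python
def closed_form_u(x, y):
    """Candidate solution u(x, y) = 1 / (x + y - 1) of (2x-1)u_x + (2y-1)u_y + 2u = 0."""
    return 1.0 / (x + y - 1.0)


def rk4_characteristic(y0, s_end, steps=1000):
    """Integrate x' = 2x - 1, y' = 2y - 1, z' = -2z from (1, y0, 1/y0) up to s_end."""

    def rhs(state):
        x, y, z = state
        return (2.0 * x - 1.0, 2.0 * y - 1.0, -2.0 * z)

    state = (1.0, y0, 1.0 / y0)
    h = s_end / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(v + 0.5 * h * k for v, k in zip(state, k1)))
        k3 = rhs(tuple(v + 0.5 * h * k for v, k in zip(state, k2)))
        k4 = rhs(tuple(v + h * k for v, k in zip(state, k3)))
        state = tuple(
            v + h * (a + 2.0 * b + 2.0 * c + e) / 6.0
            for v, a, b, c, e in zip(state, k1, k2, k3, k4)
        )
    return state


if __name__ == "__main__":
    x, y, z = rk4_characteristic(y0=1.5, s_end=0.4)
    print(z, closed_form_u(x, y))  # the two values should agree closely
```

Along each characteristic, the transported value $z(s)$ agrees with the closed form evaluated at the end point $(x(s), y(s))$, up to the integration error.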